The study of the stability and sensitivity of statistical methods and algorithms with respect to their data is an important problem in machine learning and statistics. The performance of an algorithm under resampling of the data is a fundamental measure of its stability and is closely related to the algorithm's generalization and privacy. In this paper, we study the resampling sensitivity of principal component analysis (PCA). Given an $ n \times p $ random matrix $ \mathbf{X} $, let $ \mathbf{X}^{[k]} $ be the matrix obtained from $ \mathbf{X} $ by resampling $ k $ randomly chosen entries of $ \mathbf{X} $. Let $ \mathbf{v} $ and $ \mathbf{v}^{[k]} $ denote the principal components of $ \mathbf{X} $ and $ \mathbf{X}^{[k]} $, respectively. In the proportional growth regime $ p/n \to \xi \in (0,1] $, we establish a sharp threshold for the sensitivity/stability transition of PCA. When $ k \gg n^{5/3} $, the principal components $ \mathbf{v} $ and $ \mathbf{v}^{[k]} $ are asymptotically orthogonal. On the other hand, when $ k \ll n^{5/3} $, they are asymptotically collinear. Since $ n^{5/3} $ is a vanishing fraction of the $ np \asymp n^2 $ entries, PCA is sensitive to the input data in the sense that resampling even a negligible portion of the input may completely change the output.
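The transition is stated in terms of the overlap $ |\langle \mathbf{v}, \mathbf{v}^{[k]} \rangle| $, which can be measured numerically. The sketch below is a minimal illustration under stated assumptions: i.i.d. standard Gaussian entries, the principal component taken as the top eigenvector of $ \mathbf{X}^\top \mathbf{X} $ computed by plain power iteration, and resampled entries drawn fresh from the same distribution. The function names are hypothetical, and at these toy sizes the asymptotic $ n^{5/3} $ threshold itself cannot be resolved; the code only shows how the quantity is computed.

```python
import math
import random


def top_principal_component(X, iters=1000):
    """Leading right singular vector of X (top eigenvector of X^T X), via power iteration."""
    n, p = len(X), len(X[0])
    # p x p Gram matrix G = X^T X (positive semidefinite, so power iteration does not oscillate)
    G = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p  # fixed start vector: identical inputs give identical outputs
    for _ in range(iters):
        w = [sum(G[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def resample_entries(X, k, rng):
    """Copy of X with k distinct, uniformly chosen entries replaced by fresh N(0, 1) draws."""
    n, p = len(X), len(X[0])
    Xk = [row[:] for row in X]
    for idx in rng.sample(range(n * p), k):
        Xk[idx // p][idx % p] = rng.gauss(0.0, 1.0)
    return Xk


def resampling_overlap(n, p, k, seed=0):
    """|<v, v^[k]>| for an n x p matrix with i.i.d. N(0, 1) entries (an assumption)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    v = top_principal_component(X)
    vk = top_principal_component(resample_entries(X, k, rng))
    return abs(sum(a * b for a, b in zip(v, vk)))  # abs() removes the sign ambiguity of v


if __name__ == "__main__":
    n, p = 40, 20  # p/n = 0.5; far too small to exhibit the asymptotic n^{5/3} threshold
    for k in (0, 10, 100, n * p):
        print(k, round(resampling_overlap(n, p, k), 3))
```

With $ k = 0 $ the overlap is exactly 1 (the same matrix is decomposed twice), while resampling every entry yields an independent matrix whose leading component is essentially unrelated to $ \mathbf{v} $. Pure-Python loops keep the sketch dependency-free; for realistic $ n $ a linear-algebra library would be the natural choice.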
Temporal reasoning is the task of predicting temporal relations between event pairs given their contexts. While some temporal reasoning models perform reasonably well on in-domain benchmarks, we know little about these systems' generalizability because of the limitations of existing datasets. In this work, we introduce a novel task named TODAY that bridges this gap with temporal differential analysis, which, as the name suggests, evaluates whether systems can correctly understand the effect of incremental changes. Specifically, TODAY makes slight changes to the context of given event pairs, and systems must tell how each subtle contextual change affects the distribution of temporal relations. To facilitate learning, TODAY also provides human-annotated explanations. We show that existing models, including GPT-3, drop to random guessing on TODAY, suggesting that they rely heavily on spurious information rather than proper reasoning for temporal predictions. On the other hand, we show that TODAY's supervision style and explanation annotations can be used in joint learning, encouraging models to use more appropriate signals during training and improving performance across several benchmarks. TODAY can also be used to train models to solicit incidental supervision from noisy sources such as GPT-3, moving further toward generic temporal reasoning systems.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice and the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive for participation (70%), while receiving prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a considerable portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Practices in the built environment have become increasingly digitalized with the rapid development of modern design and construction technologies. However, practitioners' and scholars' need to gather complex professional knowledge about the built environment has not yet been met. In this paper, more than 80,000 paper abstracts from the built environment field were collected to build a knowledge graph, a knowledge base that stores entities and the relations connecting them in a graph-structured data model. To ensure the accuracy with which entities and relations are retrieved for the knowledge graph, two well-annotated datasets were created for the named entity recognition (NER) and relation extraction (RE) tasks, containing 2,000 and 1,450 instances, respectively, across 29 relations. These two tasks were solved by two BERT-based models trained on the proposed datasets. Both models attained an accuracy above 85% on their respective tasks. Applying these models to all of the abstracts yielded more than 200,000 high-quality relations and entities. Finally, the knowledge graph is presented through a self-developed visualization system that reveals relations between various entities in the domain. Both the source code and the annotated datasets are available at https://github.com/HKUST-KnowComp/BEKG.
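The graph-structured data model described above can be pictured as a set of (head, relation, tail) triples indexed by entity. The Python sketch below is a minimal in-memory version under that assumption; the class, method, relation, and entity names are all hypothetical and invented for illustration, not taken from the BEKG code or its 29 relation types.

```python
from collections import defaultdict


class TripleStore:
    """Minimal in-memory knowledge graph of (head, relation, tail) triples."""

    def __init__(self):
        self.out_edges = defaultdict(set)  # head -> {(relation, tail), ...}
        self.in_edges = defaultdict(set)   # tail -> {(relation, head), ...}

    def add(self, head, relation, tail):
        """Store one extracted triple, indexed in both directions."""
        self.out_edges[head].add((relation, tail))
        self.in_edges[tail].add((relation, head))

    def neighbors(self, entity, relation=None):
        """Tails reachable from `entity`, optionally filtered by relation type."""
        return sorted(t for r, t in self.out_edges.get(entity, ()) if relation in (None, r))

    def incoming(self, entity, relation=None):
        """Heads that point to `entity`, optionally filtered by relation type."""
        return sorted(h for r, h in self.in_edges.get(entity, ()) if relation in (None, r))


# Hypothetical triples, invented for illustration (not from the BEKG data).
kg = TripleStore()
kg.add("BIM", "used_for", "clash detection")
kg.add("BIM", "used_for", "cost estimation")
kg.add("BIM", "part_of", "digital construction")
print(kg.neighbors("BIM", "used_for"))  # ['clash detection', 'cost estimation']
print(kg.incoming("cost estimation"))   # ['BIM']
```

A knowledge graph of the size reported here would more typically live in a graph database; the dictionary-of-sets layout simply makes the entity-relation-entity structure explicit.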
Recent breakthroughs in semi-supervised semantic segmentation have been achieved through contrastive learning. In prevalent pixel-wise contrastive learning solutions, the model maps pixels to deterministic representations and regularizes them in the latent space. However, because of the model's limited cognitive ability, inaccurate pseudo-labels arise that map the ambiguous representations of pixels to the wrong classes. In this paper, we define pixel-wise representations from a new perspective based on probability theory and propose a Probabilistic Representation Contrastive Learning (PRCL) framework that improves representation quality by taking the probability of each representation into consideration. By modeling the mapping from pixels to representations probabilistically via multivariate Gaussian distributions, we can tune the contribution of ambiguous representations to tolerate the risk of inaccurate pseudo-labels. Furthermore, we define prototypes in the form of distributions, which indicate the confidence of a class, whereas point prototypes cannot. Moreover, we propose to regularize the distribution variance to enhance the reliability of representations. Taking advantage of these benefits, high-quality feature representations can be derived in the latent space, further improving semantic segmentation performance. We conduct extensive experiments to evaluate PRCL on Pascal VOC and Cityscapes, demonstrating its superiority. The code is available at https://github.com/Haoyu-Xie/PRCL.
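How a Gaussian representation lets ambiguous pixels contribute less can be illustrated with two standard operations on diagonal Gaussians. The sketch below assumes, rather than reproduces, PRCL's choices: similarity is measured by a mutual-likelihood-style score (the log-density that two Gaussian embeddings coincide), and a distribution-valued prototype is formed as the precision-weighted product of member Gaussians. Function names are hypothetical.

```python
import math


def mutual_likelihood_score(mu1, var1, mu2, var2):
    """Log-density that two diagonal-Gaussian embeddings coincide:
    log of the integral of N(z; mu1, var1) N(z; mu2, var2) dz = log N(mu1; mu2, var1 + var2)."""
    score = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        s = v1 + v2  # variances add, so uncertain embeddings soften the distance term
        score -= 0.5 * ((m1 - m2) ** 2 / s + math.log(2.0 * math.pi * s))
    return score


def gaussian_prototype(mus, variances):
    """Distribution-valued prototype: precision-weighted product of diagonal Gaussians."""
    dim = len(mus[0])
    mean, var = [], []
    for d in range(dim):
        precision = sum(1.0 / v[d] for v in variances)
        mean.append(sum(m[d] / v[d] for m, v in zip(mus, variances)) / precision)
        var.append(1.0 / precision)
    return mean, var


# A confident pixel at 0.0 and an ambiguous one at 10.0 (hypothetical 1-D embeddings):
# the prototype mean stays close to the confident pixel.
proto_mean, proto_var = gaussian_prototype([[0.0], [10.0]], [[1.0], [100.0]])
print(proto_mean, proto_var)
```

In both functions a large variance acts as a soft down-weighting: it shrinks the distance penalty in the score and the weight of that sample in the fused prototype.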
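Recently, the vision transformer and its variants have played an increasingly important role in both monocular and multi-view human pose estimation. Treating image patches as tokens, transformers can model global dependencies within the entire image or across images from other views. However, global attention is computationally expensive. As a result, it is difficult to scale these transformer-based methods up to high-resolution features and many views. In this paper, we propose the Token-Pruned Pose Transformer (PPT) for 2D human pose estimation, which locates a rough human mask and performs self-attention only within the selected tokens. Furthermore, we extend PPT to multi-view human pose estimation. Building on PPT, we propose a new cross-view fusion strategy, called human area fusion, which considers all human foreground pixels as corresponding candidates. Experimental results on COCO and MPII show that PPT can match the accuracy of previous pose transformer methods while reducing computation. Moreover, experiments on Human3.6M and Ski-Pose show that our multi-view PPT can efficiently fuse cues from multiple views and achieves new state-of-the-art results.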
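The idea of restricting self-attention to selected tokens can be sketched in a few lines. The Python below is an illustration under simplifying assumptions, not PPT's implementation: the per-token relevance `scores` are taken as given (standing in for however the rough human mask is found), the top `keep` tokens are retained, attention uses the tokens themselves as queries, keys, and values with no learned projections, and pruned tokens pass through unchanged. All names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pruned_self_attention(tokens, scores, keep):
    """Single-head self-attention restricted to the `keep` highest-scoring tokens.

    Returns the updated token list and the sorted indices of the kept tokens.
    Pruned tokens are copied through unchanged; for brevity the tokens themselves
    serve as queries, keys and values (no learned projections).
    """
    kept = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:keep]
    d = len(tokens[0])
    out = [list(t) for t in tokens]
    for i in kept:
        # Scaled dot-product logits against kept tokens only: O(keep^2) instead of O(n^2).
        logits = [sum(a * b for a, b in zip(tokens[i], tokens[j])) / math.sqrt(d) for j in kept]
        weights = softmax(logits)
        out[i] = [sum(w * tokens[j][c] for w, j in zip(weights, kept)) for c in range(d)]
    return out, sorted(kept)


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
scores = [0.9, 0.1, 0.8, 0.0]  # hypothetical relevance, e.g. "is this patch on the person?"
out, kept = pruned_self_attention(tokens, scores, keep=2)
print(kept)  # [0, 2]
```

Attention cost now scales with `keep**2` rather than with the square of the total token count, which is the kind of saving that makes high-resolution features and many views more affordable.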
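Kidney structure segmentation from computed tomography angiography (CTA) is essential for many computer-aided kidney cancer treatment applications. The Kidney Parsing (KiPA 2022) challenge aims to build a fine-grained multi-structure dataset and improve the segmentation of multiple renal structures. Recently, U-Net has dominated medical image segmentation. In the KiPA challenge, we evaluated several U-Net variants and selected the best models for the final submission.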
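Vision transformers (ViTs) are emerging and achieve significantly higher accuracy in computer vision tasks. However, their complex architecture and enormous computation/storage demands impose an urgent need for new hardware accelerator design methodologies. This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization. To the best of our knowledge, this is the first FPGA-based ViT acceleration framework to explore model quantization. Compared with state-of-the-art ViT quantization work (an algorithmic approach only, without hardware acceleration), our quantization achieves 0.47% to 1.36% higher Top-1 accuracy under the same bit-width. Compared with a 32-bit floating-point baseline FPGA accelerator, our accelerator achieves an approximately 5.6× improvement in frame rate (56.8 FPS vs. 10.0 FPS) with a 0.71% accuracy drop on the ImageNet dataset for DeiT-Base.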
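Mixed-scheme quantization combines more than one quantizer inside a layer. The sketch below assumes, for illustration only, a mix of two common schemes: uniform fixed-point quantization and power-of-two (PoT) quantization, whose levels turn multiplications into bit shifts in FPGA logic. The row-assignment rule (`pot_ratio`, with the first rows using PoT), the per-row scale choices, and all function names are hypothetical; the framework's actual scheme and search procedure are not reproduced here.

```python
import math


def fixed_point_quantize(row, bits):
    """Uniform symmetric quantization to signed `bits`-bit integers times a per-row scale."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(x) for x in row)
    if peak == 0.0:
        return [0.0] * len(row)
    scale = peak / qmax  # the largest magnitude maps to qmax
    return [max(-qmax, min(qmax, round(x / scale))) * scale for x in row]


def power_of_two_quantize(row, bits):
    """Snap |x| / peak to the nearest level 2^-e (e = 0 .. 2^(bits-1) - 2) in log2 space, or to 0.

    Multiplying by 2^-e is a bit shift, which is why PoT levels are cheap on FPGAs.
    """
    max_exp = 2 ** (bits - 1) - 2
    peak = max(abs(x) for x in row)
    out = []
    for x in row:
        mag = abs(x) / peak if peak else 0.0
        if mag < 2.0 ** (-max_exp - 1):  # below the smallest level: snap to zero
            out.append(0.0)
            continue
        e = min(max_exp, max(0, round(-math.log2(mag))))
        out.append(math.copysign(peak * 2.0 ** -e, x))
    return out


def mixed_scheme_quantize(rows, bits, pot_ratio):
    """Hypothetical rule: the first `pot_ratio` fraction of rows uses PoT, the rest fixed point."""
    n_pot = round(len(rows) * pot_ratio)
    return [
        power_of_two_quantize(r, bits) if i < n_pot else fixed_point_quantize(r, bits)
        for i, r in enumerate(rows)
    ]


row = [0.5, -0.25, 0.1, 1.0]
print(power_of_two_quantize(row, 4))  # [0.5, -0.25, 0.125, 1.0]
print(fixed_point_quantize(row, 4))
```

Fixed-point rows keep evenly spaced levels, while PoT rows space their levels logarithmically and so resolve small weights more finely. One plausible motivation for splitting rows between the two is that multipliers and shifters are different FPGA resources that can then be used side by side.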
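Gaze estimation, a method for determining where a person is looking from an image of the person's face, is a valuable cue for understanding human intention. As in other areas of computer vision, deep learning (DL) methods have gained recognition in the gaze estimation domain. However, gaze calibration problems remain in this domain, preventing existing methods from further improving performance. An effective solution is to directly predict the difference information of two human eyes, as in the differential network (Diff-NN). However, this solution loses accuracy when only one inference image is used. We propose a differential residual model (DRNet) combined with a new loss function to exploit the difference information of two eye images. We treat the difference information as auxiliary information. We evaluate the proposed model (DRNet) mainly on two public datasets: (1) MPIIGaze and (2) EyeDiap. Considering only eye features, DRNet outperforms state-of-the-art gaze estimation methods, with angular errors of 4.57 and 6.14 on the MPIIGaze and EyeDiap datasets, respectively. Furthermore, the experimental results show that DRNet is highly robust to noisy images.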
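The angular error used to report these results is the angle between predicted and ground-truth 3D gaze directions. A minimal Python sketch follows; the (pitch, yaw) to vector conversion uses one convention common in MPIIGaze-style code and is an assumption, as are the function names.

```python
import math


def pitch_yaw_to_vector(pitch, yaw):
    """Unit 3D gaze vector from (pitch, yaw) in radians; assumed MPIIGaze-style convention."""
    return (
        -math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),
        -math.cos(pitch) * math.cos(yaw),
    )


def angular_error(g_pred, g_true):
    """Angle in degrees between two 3D gaze vectors (they need not be normalized)."""
    dot = sum(a * b for a, b in zip(g_pred, g_true))
    norms = math.sqrt(sum(a * a for a in g_pred)) * math.sqrt(sum(b * b for b in g_true))
    cos_sim = max(-1.0, min(1.0, dot / norms))  # clamp against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_sim))


def mean_angular_error(preds, targets):
    """Dataset-level metric: average angular error over paired predictions and labels."""
    errors = [angular_error(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)


g0 = pitch_yaw_to_vector(0.0, 0.0)
g1 = pitch_yaw_to_vector(0.0, math.radians(10.0))
print(angular_error(g0, g1))  # approximately 10.0
```

Clamping the cosine guards against values slightly outside [-1, 1] caused by floating-point rounding, which would otherwise make `math.acos` raise an error.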
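Using machine learning to solve combinatorial optimization (CO) problems is challenging, especially when the data are unlabeled. This work proposes an unsupervised learning framework for CO problems. Our framework follows the standard relaxation-plus-rounding approach and adopts neural networks to parameterize the relaxed solutions, so that simple back-propagation can train the model end to end. Our key contribution is the observation that if the relaxed objective satisfies entry-wise concavity, a low optimization loss guarantees the quality of the final integral solutions. This observation significantly broadens the applicability of the previous framework inspired by Erdős' probabilistic method. In particular, it can guide the design of objective models in applications where the objectives are not given explicitly and must be modeled a priori. We evaluate our framework by solving a synthetic graph optimization problem as well as two real-world applications: resource allocation in circuit design and approximate computing. Our framework largely outperforms baselines based on naïve relaxation, reinforcement learning, and the Gumbel-softmax trick.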
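The role of entry-wise concavity in the rounding step can be seen on a toy problem. With the other entries fixed, a function concave in $ x_i $ on $ [0,1] $ satisfies $ f(\ldots, x_i, \ldots) \ge (1-x_i) f(\ldots, 0, \ldots) + x_i f(\ldots, 1, \ldots) \ge \min\{f(\ldots, 0, \ldots), f(\ldots, 1, \ldots)\} $, so rounding entries one at a time to the better endpoint never increases the loss. The sketch below assumes a hypothetical relaxed objective for maximum independent set, which is multilinear and hence affine (thus concave) in each entry, and takes the relaxed probabilities as given where the framework would produce them with a trained neural network. Function names and the penalty weight `beta` are illustrative.

```python
import random


def mis_loss(x, edges, beta=2.0):
    """Relaxed maximum-independent-set loss (a hypothetical example objective).

    Rewards selected nodes and penalizes selected edges; it is multilinear in x,
    hence affine, and so concave, in each entry when the others are held fixed.
    """
    return -sum(x) + beta * sum(x[i] * x[j] for i, j in edges)


def sequential_round(x, edges, beta=2.0):
    """Round entries one at a time to whichever endpoint, 0 or 1, gives the lower loss.

    By entry-wise concavity, f(x) >= min(f(x_i=0), f(x_i=1)), so no step raises the loss.
    """
    x = list(x)
    for i in range(len(x)):
        x[i] = 0.0
        loss_zero = mis_loss(x, edges, beta)
        x[i] = 1.0
        loss_one = mis_loss(x, edges, beta)
        x[i] = 0.0 if loss_zero <= loss_one else 1.0
    return x


# Relaxed probabilities would come from a trained network; random values stand in here.
rng = random.Random(0)
n = 8
edges = [(i, (i + 1) % n) for i in range(n)] + [(0, 4)]
relaxed = [rng.random() for _ in range(n)]
rounded = sequential_round(relaxed, edges)
print(mis_loss(relaxed, edges), "->", mis_loss(rounded, edges))
```

With `beta > 1`, a node is set to 1 only when none of its already-rounded neighbors is 1, and every later neighbor then prefers 0, so the output is also a valid independent set.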